From Lexical Semantics to Text Analysis

Author

  • Sabine Bergler

Abstract

1 Motivation

One of the major challenges today is coping with an overabundance of potentially important information. With newspapers such as the Wall Street Journal available electronically as large text databases, the analysis of natural language texts for the purpose of information retrieval has found renewed interest. Knowledge extraction and knowledge detection in large text databases are challenging problems, most recently under investigation in the TIPSTER projects funded by DARPA, the research funding agency of the U.S. Department of Defense. Traditionally, the parameters of an information retrieval task are the style of analysis (statistical or linguistic), the domain of interest (TIPSTER, for instance, focuses on news concerning microchip design and joint ventures), the task (filling database entries, question answering, etc.), and the representation formalism (templates, Horn clauses, KL-ONE, etc.). The premise of this paper is that a careful linguistic analysis yields much more detailed information than a statistical analysis. Moreover, as we hope to illustrate with this paper, a successful linguistic analysis provides more reliable data. The problem, however, is that linguistic analysis is very costly, and systems that perform complete, reliable analysis of newspaper articles do not currently exist. The challenge, then, is to find ways to do linguistic analysis where it is possible and to the extent that it is feasible. We claim that a promising approach is to perform careful linguistic preprocessing of the texts, representing linguistically encoded information in a task-independent, faithful, and reusable representation scheme.
We propose a representation scheme, MTR¹ (for Minimal Text Representation), that does not constitute a text interpretation (nor does it "extract" or "detect" any particular information) but rather forms a common intermediate representation that must be further processed with the particular domain, task, and representation formalism in mind. The benefit of an intermediate representation at the level of MTR is that certain computationally expensive linguistic analyses do not have to be duplicated for different tasks. The introduction of an intermediate representation (which has to be further evaluated) that also supports partial representation, and thus incremental analysis of texts, enables the ...

¹ MTR is described in more detail in [Bergler, 1992]. Here, we will only describe one aspect, profile structure.
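
The reuse argument above can be sketched in Python under loudly stated assumptions: the expensive linguistic preprocessing runs once per text and its result is cached, while several task-specific consumers (here, template filling and question answering) read the same intermediate result. Everything in the sketch is hypothetical: `IntermediateRep`, `analyze`, `fill_template`, and `answer_question` are illustrative names, the "analysis" is plain whitespace tokenization, and the structure does not reproduce Bergler's MTR or its profile structure.

```python
from dataclasses import dataclass
from functools import lru_cache

# Hypothetical stand-in for a task-independent intermediate representation.
# It does NOT reproduce the actual MTR format described by Bergler.
@dataclass(frozen=True)
class IntermediateRep:
    text: str
    tokens: tuple
    complete: bool  # assumption: False would mark a partial analysis

ANALYSIS_CALLS = 0  # counts how often the expensive step really runs

@lru_cache(maxsize=None)
def analyze(text: str) -> IntermediateRep:
    """Placeholder for costly linguistic preprocessing; cached per text."""
    global ANALYSIS_CALLS
    ANALYSIS_CALLS += 1
    tokens = tuple(text.split())  # toy "analysis": whitespace tokenization
    # Illustrative rule only: treat very long texts as partially analyzed.
    return IntermediateRep(text=text, tokens=tokens, complete=len(tokens) < 50)

def fill_template(rep: IntermediateRep) -> dict:
    """Task 1 (hypothetical): collect capitalized tokens as candidate entities."""
    return {"entities": [t for t in rep.tokens if t[:1].isupper()]}

def answer_question(rep: IntermediateRep, keyword: str) -> bool:
    """Task 2 (hypothetical): does the text mention a given keyword?"""
    return keyword in rep.tokens

# Illustrative input; both tasks reuse the same cached analysis.
article = "Acme Corp announced a joint venture with Beta Inc"
print(fill_template(analyze(article)))               # {'entities': ['Acme', 'Corp', 'Beta', 'Inc']}
print(answer_question(analyze(article), "venture"))  # True
print(ANALYSIS_CALLS)                                # 1: the analysis ran only once
```

Caching with `functools.lru_cache` is just one simple way to avoid re-running the analysis for each task; the `complete` flag is an assumed stand-in for the partial representations the text mentions, signalling to downstream consumers that they may be working with an incomplete analysis.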

Similar resources

A Stylistic Analysis of Lexicon in Ray Bradbury’s The Martian Chronicles

Ray Bradbury’s The Martian Chronicles is a futuristic, science fiction novel that chronicles the colonization of Mars by humans, projecting the United States’ colonial and immigrant past on to a symbolic future. Bradbury’s use of language is mostly picturesque and sensory. The present paper applies a text-oriented analysis of stylistic elements that construct meaning in the text and evoke the n...

L2 Learners’ Lexical Inferencing: Perceptual Learning Style Preferences, Strategy Use, Density of Text, and Parts of Speech as Possible Predictors

This study was intended first to categorize the L2 learners in terms of their learning style preferences and second to investigate if their learning preferences are related to lexical inferencing. Moreover, strategies used for lexical inferencing and text related issues of text density and parts of speech were studied to determine their moderating effects and the best predictors of lexical infe...

Lexical Semantics of Adjectives: A Microtheory of Adjectival Meaning

This work belongs to a family of research efforts, called microtheories and aimed at describing the static meaning of all lexical categories in several languages in the framework of the MikroKosmos project on computational semantics. The latter also involves other static microtheories describing world knowledge and syntax-semantics mapping as well as dynamic microtheories connected with the act...

What is in a text, what isn't, and what this has to do with lexical semantics

This paper queries which aspects of lexical semantics can reasonably be expected to be modelled by corpus-based theories such as distributional semantics or techniques such as ontology extraction. We argue that a full lexical semantics theory must take into account the extensional potential of words. We investigate to which extent corpora provide the necessary data to model this information and...

The role of Persian causative markers in the acquisition of English causative verbs

This project investigates the relationship between lexical semantics and causative morphology in the acquisition of causative/inchoative-related verbs in English as a foreign language by Iranian speakers. Results of translation and picture judgment tasks show that although L2 learners have largely acquired the correct lexico-syntactic classification of verbs in English, they were constrained by ...

Native and Non-native Use of Lexical Bundles in Discussion Section of Political Science Articles

The study of lexical bundles has been gaining importance over other types of text analysis in the last century. The present study employed a frequency-based approach to analyzing the use of lexical bundles. The discussion sections of 60 political science articles, comprising a corpus of around 253,063 words, were investigated with respect to three aspects of lexical bundles: structure, form, and function. The p...

Publication date: 1994